# Multilingual Speech Synthesis

Llasa is a text-to-speech (TTS) system built on LLaMA. It extends the language model with speech tokens and supports speech generation in Chinese and English.

Speech Synthesis, Supports Multiple Languages

Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and sound effects.

Speech Synthesis, Supports Multiple Languages

IndicF5 is a multilingual text-to-speech (TTS) model with near-human quality, supporting 11 Indian languages.

Speech Synthesis, Other

viⓍTTS is a voice generation model that can clone a voice across languages from a short 6-second audio clip.

Speech Synthesis, Other

IndicF5 is a multilingual text-to-speech (TTS) model with near-human quality, trained on 1,417 hours of high-quality speech data and supporting 11 Indian languages.

Speech Synthesis, Other

Speecht5 Finetuned Voxpopuli Lt

A text-to-speech model based on microsoft/speecht5_tts and fine-tuned on the VoxPopuli dataset.

Speech Synthesis

Kokoro is an open-source TTS model with 82 million parameters, delivering audio quality comparable to larger models while offering significant speed advantages and cost efficiency.

Speech Synthesis, English

YarnGPT2 is a text-to-speech (TTS) model specifically designed for synthesizing Nigerian-accented languages (Yoruba, Igbo, Hausa, and English).

Speech Synthesis

Transformers, English

Cosyvoice2 0.5B

CosyVoice is a text-to-speech (TTS) model that supports multilingual synthesis and voice conversion, producing high-quality speech.

Speech Synthesis

Parler Tts Mini Multilingual V1.1

Parler-TTS Mini Multilingual v1.1 is a multilingual extension of Parler-TTS Mini, supporting text-to-speech in 8 European languages.

Speech Synthesis

Transformers, Supports Multiple Languages

Indri 0.1 350m Tts

Indri is an ultra-small, lightweight TTS model based on the Transformer architecture, supporting text-to-speech in English and Hindi.

Speech Synthesis

Transformers, Supports Multiple Languages

GPT SoVITS V1 Base

GPT-SoVITS (V1) is a multilingual text-to-speech foundation model supporting Chinese, English, and Japanese.

Speech Synthesis, Supports Multiple Languages

Indic Parler Tts Pretrained

The Indic Parler-TTS Pretrained Model is a multilingual Indian language extension of Parler-TTS Mini, supporting 21 languages, including various Indian languages and English.

Speech Synthesis

Transformers, Supports Multiple Languages

Indic Parler Tts

Indic Parler-TTS is a multilingual extension of Parler-TTS Mini, supporting 21 languages including various Indian languages and English.

Speech Synthesis

Transformers, Supports Multiple Languages

A Transformers-based text-to-speech (TTS) model that converts input text into natural-sounding speech.

Speech Synthesis

Cosyvoice 300M SFT

CosyVoice is a text-to-speech (TTS) model that supports multilingual and multi-style voice synthesis.

Speech Synthesis

Speecht5 Tts Urdu

An Urdu text-to-speech model fine-tuned from Microsoft's SpeechT5, supporting Romanized input.

Speech Synthesis

Transformers, Other

viⓍTTS is a voice generation model that supports voice cloning in 18 languages, with special optimization for Vietnamese.

Speech Synthesis

Transformers, Other

Mms Tts Tuk Script Latin

A Turkmen text-to-speech model developed by Meta, part of the Massively Multilingual Speech project, supporting speech synthesis for Turkmen written in Latin script.

Speech Synthesis

A Catalan text-to-speech model developed by Meta, using the end-to-end VITS architecture for high-quality speech synthesis.

Speech Synthesis

A Bengali text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

A Bemba (bem) text-to-speech model developed by Meta as part of the Massively Multilingual Speech project.

Speech Synthesis

A Somali text-to-speech model developed by Meta as part of the MMS project, supporting the conversion of Somali text into natural speech.

Speech Synthesis

A Kekchi text-to-speech model developed by Meta as part of the Massively Multilingual Speech project.

Speech Synthesis

An Odia text-to-speech model from Facebook's MMS project, providing high-quality speech synthesis based on the VITS architecture.

Speech Synthesis

A Latin text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

A Romanian text-to-speech model developed by Meta, using the VITS architecture for high-quality speech synthesis.

Speech Synthesis

An end-to-end text-to-speech model for Tagalog developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

An Ese Ehue text-to-speech model developed by Meta as part of the Massively Multilingual Speech project, supporting high-quality speech synthesis.

Speech Synthesis

A Telugu text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

A Tamil text-to-speech model from Facebook's Massively Multilingual Speech project, providing high-quality speech synthesis based on the VITS architecture.

Speech Synthesis

A Marathi text-to-speech model developed by Meta, supporting high-quality speech synthesis.

Speech Synthesis

A Fiji Hindi text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

An Albanian text-to-speech model developed by Meta, based on the VITS architecture, supporting high-quality speech synthesis.

Speech Synthesis

Speecht5 Finetuned Common Voice Be

A Belarusian text-to-speech model based on the Microsoft SpeechT5 architecture, fine-tuned on the Common Voice dataset.

Speech Synthesis

Transformers, Other

Bark is a Transformer-based text-to-audio model created by Suno, capable of generating highly realistic multilingual speech, music, background noise, and simple sound effects.

Speech Synthesis

Transformers, Supports Multiple Languages

Silero Model V3 Ru

Silero Speech Model is a text-to-speech (TTS) model focused on Russian, developed and open-sourced by snakers4.

Speech Synthesis

Transformers, Other

© 2025 AIbase